Hemagglutinin-esterases (HEs) are bimodular envelope proteins
of orthomyxoviruses, toroviruses, and coronaviruses with a
carbohydrate-binding "lectin" domain appended to a
receptor-destroying sialate-O-acetylesterase ("esterase"). In concert, these
domains facilitate dynamic virion attachment to cell-surface sialoglycans.
Most HEs (type I) target 9-O-acetylated sialic acids (9-O-Ac-Sias), but
one group of coronaviruses switched to using 4-O-Ac-Sias instead
(type II). This specificity shift required quasisynchronous adaptations
in the Sia-binding sites of both lectin and esterase domains.
Previously, a partially disordered crystal structure of a type II HE revealed
how the shift in lectin ligand specificity was achieved. How the switch
in esterase substrate specificity was realized remained unresolved,
however. Here, we present a complete structure of a type II HE with
a receptor analog in the catalytic site and identify the mutations
underlying the 9-O- to 4-O-Ac-Sia substrate switch. We show that
(i) common principles pertaining to the stereochemistry of
protein-carbohydrate interactions were at the core of the transition in lectin
ligand and esterase substrate specificity; (ii) in consequence, the
switch in O-Ac-Sia specificity could be readily accomplished via
convergent intramolecular coevolution with only modest architectural
changes in lectin and esterase domains; and (iii) a single, inconspicuous
Ala-to-Ser substitution in the catalytic site was key to the emergence
of the type II HEs. Our findings provide fundamental insights into how
proteins "see" sugars and how this affects protein and virus evolution.

coronavirus | hemagglutinin-esterase | sialic acid | crystal structure | 
sialate-O-acetyl esterase 

Among host cell surface determinants for pathogen adherence,
sialic acids (Sias) rank prominently (1, 2). Representatives of
at least 11 families of vertebrate viruses use Sia as primary entry
receptor and/or attachment factor (3, 4). Viral adherence to
sialoglycans, however, comes with inherent complexities related to
(i) the sheer ubiquity of receptor determinants that may act as
“decoys” when present on off-target cells and non-cell-associated
glycoconjugates, and (ii) the dense clustering that is characteristic
of glycotopes and that may augment the apparent affinity of
ligand-lectin interactions by orders of magnitude (5, 6). Viruses may avoid
inadvertent virion binding to nonproductive sites by being selective
for particular sialoglycan subtypes so that attachment is dependent
on Sia linkage type, the underlying glycan chain, and/or the absence
or presence of specific postsynthetic Sia modifications (2, 7, 8).
Moreover, as an apparent strategy to evade irremediable binding to
decoy receptors, viral sialolectins typically are of low affinity, with
dissociation constants in the millimolar range (reviewed in ref. 3).
In consequence, virion-Sia interactions are intrinsically dynamic
and the affinity of the virolectins would appear to be fine-tuned
such as to ensure reversibility of virion attachment. In most viruses,
reversibility is exclusively subject to the lectin-ligand binding
equilibrium. Some, however, take this principle one step further by
encoding virion-associated enzymes to promote catalytic virion
elution through progressive local receptor depletion (3, 4).

In lineage A betacoronaviruses (A-βCoVs), a group of enveloped
positive-strand RNA viruses of human clinical and veterinary
relevance (9), catalysis-driven reversible binding to O-acetylated
Sias (O-Ac-Sias) is mediated by the hemagglutinin-esterase (HE),
a homodimeric type I envelope glycoprotein (10-15). HE monomers
resemble cellular carbohydrate-modifying proteins (16, 17),
in that they have a bimodular structure with a lectin appended to
the enzyme domain. The lectin domain mediates virion attachment
to specific O-Ac-Sia subtypes with binding hinging on the
all-important sialate-O-acetyl moiety, whereas removal of this
O-acetyl by the catalytic sialate-O-acetylesterase (“esterase”) domain
results in receptor destruction (18-21).

Intriguingly, HE homologs also occur in toroviruses (22-25) as
well as in three genera of orthomyxoviruses (Influenza virus C,
Influenza virus D, and Isavirus) (26-32), but, among coronaviruses,
exclusively in A-βCoVs (9). HE was added to the proteome
of an A-βCoV common progenitor through horizontal gene
transfer and apparently originated from a 9-O-Ac-Sia-specific
hemagglutinin-esterase fusion protein (HEF) resembling those
of influenza viruses C and D (10, 19). The acquisition of HE,
whether or not in conjunction with that of other accessory proteins
like ns2a (33), may well have sparked the radiation of the
A-βCoVs. At any rate, their expansion through cross-species
transmission was accompanied by evolution of HE, apparently
reflecting viral adaptation to the sialoglycomes of the novel hosts
(14). For example, the HE of bovine coronavirus (BCoV)
preferentially targets 7,9-di-O-Ac-Sias, a trait shared with the HEs of
bovine toroviruses (8, 24). The most dramatic switch in O-Ac-Sia
specificity occurred in the murine coronaviruses (MuCoVs), a
species of A-βCoVs in mice and rats (9). Two MuCoV biotypes can
be distinguished on the basis of their HE (14) with one group of
viruses using the prototypical attachment factor, 9-O-Ac-Sia (type I
specificity) (24), and the other exclusively binding to Sias that are
O-acetylated at carbon atom C4 (4-O-Ac-Sia) (type II specificity)
(15, 24, 34, 35). Although deceptively similar in nomenclature
and acronyms, 9-O- and 4-O-Ac-Sias are quite different in
structure (Fig. 1A), particularly when taking into account that
the sialate-O-acetyl is paramount to protein recognition. Thus, in
molecular terms, the shift in ligand/substrate preference would
seem momentous. As rules of virus evolution would predict, and
in accordance with the phylogenetic record (14), the transitions
in ligand and substrate specificity that required coevolution of
two distinct protein domains (i.e., lectin and esterase) must have
occurred swiftly and, although not necessarily simultaneously, at
least within a narrow time frame.

Previous analysis of an HE structure of MuCoV type II strain
S revealed how the shift in Sia specificity was accomplished for the
lectin domain (21). Its comparison with the (type I) HE of BCoV [a
member of species Betacoronavirus-1 distantly related to MuCoV
(9)] allowed for a rough reconstruction of the remodeling of the
lectin’s carbohydrate binding site (CBS). The catalytic site, however,
was disordered (21), and hence the question of how the switch
in substrate specificity was brought about remains unresolved. We
now present fully resolved crystal structures of a type II HE, free or
with ligand/substrate analogs in the Sia binding sites of both lectin
and esterase domain. To allow for a minute side-by-side comparison,
we also determined the structure of the esterase domain of a
closely related type I MuCoV HE. Comparative structural analysis
corroborated by structure-guided mutagenesis revealed the crucial
changes that underlie the substrate specificity switch and thus
established the structural basis for type II substrate selection. Our
findings indicate that basic principles pertaining to the
stereochemistry of protein-carbohydrate interactions were at the core of
the transition in lectin ligand and esterase substrate specificity. We
propose that, within this context, a single inconspicuous amino acid
substitution in the catalytic site (in essence, the mere introduction
of an oxygen atom) was key to the emergence of the type II HEs.

Results and Discussion 

Structure Determination and Overall Structures. The HE ectodomains
of murine coronavirus strains MHV-DVIM (type I) and
RCoV-NJ (type II), either intact or rendered catalytically inactive
through active-site Ser-to-Ala substitutions (HE⁰), were expressed
as thrombin-cleavable Fc fusion proteins. The expression products
retained full biological activity as was demonstrated by solid-phase
lectin-binding assays and receptor destruction assays with bovine
submaxillary mucin (BSM) and horse serum glycoproteins (HSGs)
(Fig. 1B); these sialoglycoconjugates carry 9-O-Ac- and 4-O-Ac-Sias,
respectively (36, 37), and were used to assess esterase specificity
throughout.

Crystals of MHV-DVIM HE, and of RCoV-NJ HE⁰, free or
in complex with the nonhydrolysable ligand/substrate analog
4,5-di-N-acetylneuraminic acid α-methylglycoside (α-4-N-Ac-Sia),
diffracted to 2.0, 2.2, and 1.85 Å, respectively. Structures were
solved by molecular replacement using MHV-S HE [Protein Data






Fig. 1. (A) Stick representation of 9-O-Ac-Sia and 4-O-Ac-Sia. O-Ac moieties
are depicted with carbon atoms in cyan. (B) Substrate specificity of MHV-DVIM
HE (red circles) and RCoV-NJ HE (blue squares). BSM (Left) and HSG (Right)
were coated in MaxiSorb plates and incubated with twofold serial dilutions
(starting at 100 ng/μL) of enzymatically active HE-Fc fusion proteins. Loss of
4-O- and 9-O-Ac-Sias (indicated by percentual depletion on the y axis) was
assessed by solid-phase lectin-binding assay with enzymatically inactive
virolectins MHV-S HE⁰-Fc and PToV-P4 HE⁰-Fc, respectively, with virolectin
concentrations fixed at 50% maximal binding. (C) Cartoon representation of the
crystal structures of the RCoV-NJ HE and MHV-DVIM HE dimers. The Left
monomer is colored gray, the other by domain: lectin domain (L, blue);
esterase domain (E, green) with Ser-His-Asp active site triad (cyan sticks);
membrane proximal domain (red).


Bank (PDB) ID code 4C7W; for RCoV-NJ HE] and BCoV-Mebus 
HE (PDB ID code 3CL5; for MHV-DVIM HE) as search models. 
For crystallographic details, see Table 1. 

Overall, the murine coronavirus HEs closely resemble those of 
other nidoviruses, assembling into homodimers and with monomers
displaying the characteristic domain organization (Fig. 1C)
(19, 20). For RCoV-NJ HE, complete structures were determined. 
In the case of MHV-DVIM HE, the lectin domain was partially 
disordered, but the structure of the esterase domain was resolved. 

The lectin domain of RCoV-NJ HE is virtually identical to that
of (type II) MHV-S HE (21) (rmsd on main-chain Cα atoms: 0.31 Å;
Table S1; for a sequence alignment of representative type I and
II HEs, see Fig. S1). The same holds for the binding mechanism
and topology of the ligand (Figs. S2A and S3B). One notable
difference is in the lectin domain’s metal-binding site, a signature
element of coronavirus HEs (19). That of RCoV-NJ contains Na⁺
rather than K⁺ as inferred from the bond lengths to the
coordinating amino acids (Asp225, Ser226, Gln227, Ser273, Glu275, and
Leu277), B factors, and abundance in crystallization solution (Tables
Table 1. Data collection and refinement statistics

Data collection and refinement   MHV-DVIM HE              RCoV-NJ HE free          RCoV-NJ HE complex

Data collection
  Synchrotron                    ESRF                     SLS                      ESRF
  Beamline                       ID23-1                   PX                       ID23-2
  Wavelength, Å                  0.9999                   0.9999                   0.8729
  Space group                    P2₁2₁2₁                  C222₁                    C222₁
  Cell dimensions
    a, b, c, Å                   88.52, 88.82, 122.16     60.71, 184.37, 76.90     57.09, 184.59, 78.08
    α, β, γ, °                   90, 90, 90               90, 90, 90               90, 90, 90
  Resolution range, Å*           44.41-2.00 (2.03-2.00)   61.2-2.2 (2.27-2.20)     54.54-1.85 (1.89-1.85)
  Total no. reflections          601,769 (20257)          92,318 (8952)            107,080 (5853)
  No. unique reflections         65,139 (2878)            22,281 (2049)            33,539 (2066)
  Rmerge                         0.096 (1.184)            0.109 (0.68)             0.106 (0.519)
  I/σI                           12.5 (2.3)               8.5 (2.2)                6.2 (1.8)
  Redundancy                     9.24 (7.0)               4.1 (4.4)                3.2 (2.8)
  Completeness, %                99.2 (90.9)              99.6 (100)               94.4 (95.7)
  CC(1/2)                        0.999 (0.747)            0.995 (0.815)            0.990 (0.577)

Refinement
  Rwork/Rfree                    0.1990/0.2264            0.2319/0.2783            0.1851/0.2006
  No. atoms
    Protein                      5,708                    2,929                    3,058
    Water/other ligands          223/463                  89/86                    186/182
  Average B/Wilson B, Å²         52.0/42.5                40.99/25.4               12.5/22.5
  Rms deviations
    Bond lengths, Å              0.018                    0.0094                   0.007
    Bond angles, °               1.949                    0.9254                   1.300
  Ramachandran plot
    Favored, %                   96.6                     95.0                     97.0
    Allowed, %                   3.4                      5.0                      3.0
    Outliers, %                  0                        0                        0

*Numbers between brackets refer to the outer resolution shell.


S2 and S3). Its structure, however, is fully conserved, with all key
residues in RCoV-NJ HE aligning with those in MHV-S HE (Fig.
S2B). It would thus appear that the metal-binding site in the type II
lectin domain can be occupied by either Na⁺ or K⁺ without major
consequences for protein structure and function.
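
The structural comparisons in this section are summarized as rmsd values on main-chain Cα atoms. As a minimal illustration of what such a number expresses, the sketch below computes the rmsd between two sets of paired Cα coordinates that are assumed to be already superposed and matched residue by residue. The function name and the toy coordinates are hypothetical and are not taken from the deposited structures; the superposition step itself, which crystallographic software performs before reporting an rmsd, is not shown.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between paired, pre-superposed C-alpha coordinates.

    Both arguments are lists of (x, y, z) tuples in the same units
    (e.g. angstroms), with residue i of one model paired to residue i
    of the other. No superposition is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical toy coordinates, for illustration only.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.3, 0.0, 0.0), (3.8, 0.3, 0.0), (7.6, 0.0, 0.3)]
print(round(ca_rmsd(model_a, model_b), 2))
```

Because every atom pair contributes equally, a sub-ångström value such as those quoted here can only arise when essentially all paired Cα positions coincide closely after superposition.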

The esterase domain of the RCoV-NJ HE is strikingly similar to
those of MHV-DVIM and BCoV-Mebus HE (Fig. S1; rmsd of
0.25 Å on main-chain Cα atoms for all three combinations, Table
S1), despite the difference in substrate specificity. As was predicted
from primary sequence similarity [66% identity overall between
MHV-DVIM and RCoV-NJ HE and 70% in the esterase domain
(14)] and confirmed by present structural data, the shift in
substrate specificity from 9- to 4-O-Ac-Sia required minimal
architectural changes. A crystal structure of RCoV-NJ HE⁰ complexed
with 4,5-di-N-Ac-Sia was obtained by soaking at high Sia
concentrations (100 mM) and low temperature (4 °C) to allow for the
stabilization of low-affinity interactions. The electron density map
revealed a well-defined substrate analog molecule (Fig. S3) bound
in the active site.

All Elements of the Ancestral Type I Catalytic Center Are Conserved in
Sia-4-O-Ac-Specific Type II HEs. The nidoviral and orthomyxoviral
esterase domains form a separate family in the c.23.10
Ser-Gly-Asn-His (SGNH) superfamily of esterases and acetylhydrolases
(18, 38). These enzymes are characterized by an α/β
domain organization with a central five-stranded parallel β-sheet,
and by strict topological conservation of catalytic SGNH residues
(Fig. 2A). As illustrated in Fig. 2B for MHV-DVIM HE, the Ser
and His residues, together with Asp, form a catalytic triad, arranged
in a linear array. Flanking the catalytic triad is a hydrophobic
specificity pocket (P1) to accommodate, in O-acetylesterases, the
methyl group of the target Sia-O-acetylate. The conserved Gly and
Asn residues located along the upper rim of this pocket contribute
through main-chain and side-chain amides, respectively, to create
an oxyanion hole in combination with the main-chain amide of the
active site Ser (Fig. 2 A, C, and D) (18, 39, 40).

The viral esterase domains differ from other SGNH hydrolases
by the presence of a second hydrophobic pocket (P2) on the
opposite side of the catalytic triad (18-21). In
sialate-9-O-acetylesterases, this pocket serves to harbor the (hydroxy)methyl
group of the Sia-5-N-acyl moiety (18). Another hallmark is a
strategically positioned Arg (Arg305 in DVIM HE), the side chain
of which extends into the catalytic center (Fig. 2 A, C, and D).
Although not essential for catalysis per se, this Arg is of overriding
importance for substrate binding and, in consequence, for the
efficient cleavage of glycosidically bound 9-O-Ac-Sias (20). Its side
chain’s head group engages in a bidentate hydrogen bond
interaction with the Sia-carboxylate (18, 39), thus fixing the Sia
pyranose ring in a proper orientation such that the Sia-9-O-acetyl
is brought in close proximity of the active-site nucleophile. As we
observed for torovirus type I HEs (20), substitution in DVIM HE
of Arg305 by Ala abrogates enzymatic activity toward natural
substrates (Fig. 2E), but does not affect cleavage of the synthetic
substrate p-nitrophenyl acetate (pNPA) (Fig. 2F).

Remarkably, all elements of the ancestral/archetypical
Sia-9-O-AE catalytic center, including P1 and P2 pockets, are present in
Sia-4-O-Ac-specific MuCoV type II HEs, with a near-perfect
alignment in MHV-DVIM and RCoV-NJ HEs of all residues
known to control sialate-9-O-acetylesterase activity (Fig. 2A).
With the enzymatic mechanism and all main structural elements
for catalysis preserved, a shift in esterase specificity from 9- to
4-O-acetylated Sias could only have been effectuated by changing the
Fig. 2. (A) Superposition of residues lining the P1 pocket of Influenza C HEF (carbon atoms, cyan), MHV-DVIM HE (carbon atoms, green), and RCoV-NJ HE
(carbon atoms, salmon). Surface representation is that of MHV-DVIM HE. Conserved residues within the SGNH family of hydrolases are underlined. (B) Surface
representation of the catalytic center of MHV-DVIM HE with the P1 and P2 pockets indicated. The Ser-His-Asp catalytic triad is shown as sticks. (C) 9-N-Ac-Sia
binding in the HEF catalytic site as observed in the crystal complex (18). Contacting amino acid side chains are shown in stick representation and colored by
atom type (oxygen, red; nitrogen, blue; carbons, gray or green for amino acid side chains and 9-N-Ac-Sia, respectively). Oxyanion hole hydrogen bonds and
the bidentate hydrogen bond interaction between Arg305 and the Sia carboxylate moiety are shown as black, dashed lines. (D) Model of 9-N-Ac-Sia binding in
the MHV-DVIM HE catalytic site based on superposition with the HEF-inhibitor complex (carbon atoms, green) and on automated molecular docking (carbon
atoms, salmon), represented as in Fig. 2C. (E) Catalytic activity of MHV-DVIM HE toward glycosidically bound 9-O-Ac-Sia is abrogated by substitution of Arg305
by Ala. Receptor destruction was assessed as in Fig. 1B. (F) Arg305Ala substitution in MHV-DVIM HE does not affect activity toward the synthetic substrate
pNPA. Ser44Ala is a catalytically inactive mutant. Enzymatic activity shown as percentage of wild-type activity.


binding topology of the substrate. As shown by the data, this is
indeed what occurred (Fig. 3A). Compared with 9-N-Ac-Sia
bound in the type I catalytic center of HEF (Fig. 2C) and to
9-O-Ac-Sia in the esterase site of MHV-DVIM HE as modeled
by superposition or automated docking (Fig. 2D), the 4-N-Ac-Sia
substrate analog in the RCoV-NJ type II enzyme is rotated by 180°
about the central Sia C2-C5 axis allowing the 4-N-acetyl moiety to
be inserted into the P1 pocket while the 5-N-acyl remains in pocket
P2 (Fig. 3 A-C). Moreover, the substrate molecule is tilted by 20°
such as to allow for sufficient space for the remaining sugar
residues of the glycan chain to which the natural substrate, 4-O-Ac-Sia,
would be attached.


As a corollary of the altered substrate topology, the catalytic site 
Arg, critical in type I HEs (20), can no longer interact with the Sia 
carboxylate. In accordance, substitution of RCoV HE Arg307 by
Ala caused only a minor reduction in sialate-4-O-acetylesterase 
activity (Fig. 3 D and E). Thus, in type II HEs, the catalytic center 
Arg, although conserved, has become functionally redundant and 
is no longer essential for substrate binding. 

Type II-Specific Amino Acid Substitutions Responsible for 4-O-Ac-Sia
Substrate Specificity Revealed by Mutational Analysis. From the type 
II HE structure, it was not immediately evident how the shift in 
substrate specificity was achieved and how binding of the original 
Fig. 3. (A) Surface representation of the MHV-DVIM HE (Left) and RCoV-NJ
HE (Right) catalytic sites in complex with 9-O-Ac-Sia [docked with Autodock4
(55)] and 4-N-Ac-Sia (crystal complex), respectively. (B) Surface representation
of the catalytic sites of MHV-DVIM HE (Left) and RCoV-NJ HE (Right). The
active-site Ser44 in MHV-DVIM HE already adopts the "active" rotamer observed in
HEF (18, 39, 40); for RCoV-NJ HE crystallized as an inactive Ser-to-Ala mutant,
a Ser side chain with active rotamer was introduced using COOT. The P1 and P2
pockets are highlighted by dashed circles; approximate distances between
pockets, as measured from the centers, are indicated. (C) Binding topology of
αNeu4,5,9Ac₃ in type I (Left) and type II (Right) esterases. The P1 and P2 pockets
accommodating the O- and N-acetyl moieties are shown schematically.
αNeu4,5,9Ac₃ is shown in stick representation and colored as in Fig. 2C.
Asterisks indicate the position of the O2 atom through which Sias are
glycosidically linked. The distances between 5-N- and 9-O- or 4-O-Ac methyl groups are
shown. (D) RCoV-NJ HE Arg307 is not essential for sialate-4-O-acetylesterase
activity. Ser40Ala is a catalytically inactive mutant. Receptor destruction was
assessed as in Fig. 1B. For a comparison with type I HEs, see Fig. 2E. (E) Arg307Ala
substitution in RCoV-NJ HE does not affect activity toward the synthetic
substrate pNPA. Enzymatic activity shown as percentage of wild-type activity.
(F) Hydrogen bonding of the sialate-5-N-acyl carbonyl oxygen and amide
nitrogen with RCoV-NJ HE Ser74 and His336, respectively, as observed in the
crystal complex, indicated as in Fig. 2C. Hydrophobic contacts between Tyr46
and the Sia-5-N-acyl methyl group are shown as thin gray lines.


substrate is excluded. We therefore performed comprehensive
comparative sequence analysis of all type I and II coronavirus
HE esterase domains available in GenBank to identify consistent
differences related to substrate specificity (for the nomenclature
of protein segments, see Fig. S4; for an alignment of
representative type I and type II HEs, see Fig. S1). Only a select number of
such dissimilarities were noted involving three distinct elements
(Fig. 4 A and B). In segment β2α3, proximal to the P2 pocket,
there is a single type-specific amino acid difference: Ala in type I
and Ser in all type II HEs. Far more prominent changes occurred
in segment α1β2, which comprises a surface-exposed disulfide
loop (formed by Cys44 and Cys65 or Cys48 and Cys69 in
RCoV-NJ HE and MHV-DVIM HE, respectively) with 16 out of 20
residues (80%) uniquely substituted in type II HEs. The other type
II-specific differences are in segment β16α6, entailing a
single-residue insertion and the substitution of the orthologs of DVIM
HE Val332-Tyr333 by Asp-Thr-His (Fig. 4 A and B). Apparently,
the changes that occurred in segments α1β2 and β16α6 are
interrelated as they resulted, among others, in the creation of a
novel metal-binding site, located near the active site and
formed by the side chains of α1β2 residues Glu48, His52, Asp56,
and β16α6 residue His336 (Fig. 4C). The presence of two
negatively charged coordinating residues indicates that the site is
occupied by a bivalent metal ion, which we identified as Zn²⁺
on the basis of (i) distances to coordinating amino acids (Tables
S4 and S5), (ii) coordination by two acidic residues and two
imidazole rings, and (iii) X-ray absorption data (Fig. S5).
Apropos, loss of this metal ion, caused by the low pH
crystallization conditions, may well have caused the disorder of the
catalytic domain in the published structure of MHV-S HE (21).

The introduction of the three type I elements of DVIM HE into
the RCoV-NJ HE background resulted in an esterase with strict
type I substrate specificity (Fig. 4D). The recombinant protein lost
all enzymatic activity toward 4-O-Ac-Sias and, compared with the
naturally occurring type I HE of MuCoV strain DVIM, even
displayed a 12-fold higher sialate-9-O-acetylesterase activity. We
postulate that, in its esterase domain, the NJ/DVIM type I chimera is a
facsimile, or at the least a close approximation, of the most recent
common ancestor of the type II HEs (i.e., of the parental HE that
still retained the original, type I specificity for 9-O-Ac-Sias).
Departing from this perspective, we asked what the importance of
the changes in the individual elements might be, and what the
minimal requirements for the ancestral type I enzyme would have
been to gain 4-O-acetylesterase activity and to exclude the original
(type I) substrate. To this end, we systematically placed back the
type II elements into the type I chimera either individually or in
combination (Fig. 4E). Separate reintroduction of the type II α1β2
Cys loop or the β16α6 segment did not result in renewed activity
toward 4-O-Ac-Sia, but in either case, sialate-9-O-acetylesterase
activity was reduced significantly, i.e., by 92% (β16α6) or even more,
to below detection levels (α1β2). Apparently, the type II-specific
mutations in either of these two segments perturb the binding of
the original type I substrate. Conversely, introduction of the single
β2α3 Ala74Ser mutation in the type I chimera produced a hybrid
enzyme that retained most of its sialate-9-O-acetylesterase activity,
but that now also accepted 4-O-Ac-Sia as a substrate. However, as the
type II activity is only 25% of that of RCoV-NJ HE, the Ala-to-Ser
substitution would not have sufficed to confer full 4-O-AE activity.
Importantly, combinations of the Ala78Ser substitution with either
the type II β16α6 segment or the α1β2 Cys loop did not have an
additive effect on the cleavage of 4-O-Ac-Sias. Actually, the latter
two segments are only functional in unison, as their combination
gave an enzyme that cleaved 4-O-Ac-Sias, albeit very inefficiently
(~5% of the activity of RCoV-NJ HE; Fig. 4E). Apparently,
contribution of the β16α6 and α1β2 segments to type II esterase activity
critically relies on formation of the novel intersegment
metal-binding site. Indeed, single substitutions introduced into RCoV-NJ
HE to disrupt metal binding (Glu48Gln or Asp56Asn) reduced
Fig. 4. (A) Partial sequence alignment of MHV-DVIM and RCoV-NJ HE,
highlighting consistent differences between type I and type II HEs (Fig. S1).
Aligned sequences, with residue numbering presented Left and Right, cover
the α1β2-cysteine-loop, the β2α3 segment (single Ala78Ser substitution), and
the β16α6 segment. Catalytic residues (Ser, Asp, His) are marked with
asterisks. (B) Overlay of cartoon representations of the active-site regions of
MHV-DVIM HE (gray) and RCoV-NJ HE (blue). Side chains of catalytic triad
residues are depicted as sticks. The three type I/II distinctive elements are
colored as in A. (C) Cartoon representation of the novel metal-binding site
near the RCoV-NJ HE active site, formed by Glu48, His52, Asp56, and His336.
The catalytic triad is shown for reference. Side chains are depicted as sticks,
the Zn²⁺ ion as a gray sphere. (D) A type II HE converted into a type I enzyme.
An RCoV HE-based chimera with all three type I/II distinctive elements
replaced by those of MHV-DVIM displays strict sialate-9-O-acetylesterase
activity. The enzyme activity of the recombinant protein ("Type I chimera")
was compared with that of the parental proteins (MHV-DVIM and RCoV-NJ


sialate-4-O-acetylesterase activity to 25%, i.e., the amount of type II
activity that would be conferred by the Ala74Ser substitution alone
(Fig. 4F). From the combined findings, we conclude (i) that, during
MuCoV evolution, the conversion of a type I HE into an enzyme
with dual (type I and type II) specificity would have required a
single Ala-to-Ser mutation; (ii) that, for this enzyme to have
gained full 4-O-AE activity, the type II-specific changes in all three
elements were necessary; and (iii) that the definitive shift in
substrate specificity, i.e., the exclusion of the original type I substrate
9-O-Ac-Sia, must be attributed to the changes in the β16α6 and
α1β2 segments.

Type II-Specific Substitutions: Structural Consequences for Substrate
Binding. The consequences of the type II-specific amino acid
substitutions become clear when they are considered in the context of
the crystal structures of the type I and II HE esterase domains. The
type II-specific mutations all affected the P2 pocket, virtually
causing the pocket to shift by 2.8 Å along the ridge, formed by the
catalytic triad, thus reducing the distance between the P1 and P2
pockets from 7 Å in MHV-DVIM HE [and all other orthomyxovirus,
torovirus, and coronavirus type I HEs (18-20)] to 6 Å in
RCoV-NJ HE (Fig. 3B). In reality, the original P2 pocket was lost
and a new one created. Within the α1β2 Cys loop, His50 in DVIM
HE was replaced by Tyr, the aromatic side chain of which is rotated
by 20° (compared with that of DVIM HE His50), opening a novel
pocket of which it forms one side. Ser74 of RCoV-NJ HE, the
ortholog of which in DVIM HE is at the periphery of the catalytic
center, now forms an adjacent side of the P2 pocket. His336 in the
β16α6 segment, replacing Tyr333 in DVIM HE, is pushed deeper
into the catalytic center as a result of the type II-specific insertion
of Asp334, and locked in position by metal coordination. Its side
chain compared with that of DVIM Tyr333 is rotated by 35°, thus
walling off the type II P2 pocket (Fig. 3A). The structure of the
esterase-ligand complex provides an attractive explanation for the
importance of the type II-specific changes in segment β16α6 and
the β2α3 Ala-to-Ser substitution, as in RCoV-NJ HE, His336 and
Ser74 are ideally positioned for hydrogen bonding with the
sialate-5-N-acyl carbonyl and -amide, respectively (Fig. 3F). Additionally,
an important role is suggested for Tyr46 in the α1β2 Cys loop as it
can form extensive hydrophobic contacts with the sialate-5-N-acyl
methyl group (Fig. 3F). We propose that these new polar and
hydrophobic interactions compensate for the loss of the
Arg/sialate-carboxylate double-hydrogen bond interaction crucial to
sialate-9-O-acetylesterases and contribute to substrate binding in type
II HEs by stabilizing 4-O-Ac-Sia in proper orientation in the
catalytic center.


HE) on BSM (Left) and HSG (Right). Cleavage of 9-O- and 4-O-Ac-Sias was
assessed as in Fig. 1B, but now starting at 10 ng/μL. (E) Contribution of the
three type I/II distinctive elements to esterase activity and substrate
specificity. The type I chimera was subjected to mutational analysis entailing
systematic reintroduction of RCoV-NJ segments. Esterase activities of
chimeric proteins toward 9-O-Ac- (blue bars) and 4-O-Ac-Sias (red bars) were
determined in twofold dilution series as in Fig. 1B. Data are shown as
percentages of specific esterase activity, calculated at 50% receptor depletion,
relative to that of the type I chimera (for 9-O-Ac-Sia) or of wild-type RCoV-NJ
HE (for 4-O-Ac-Sia). The error bars represent the SD over six measurements
(two biological replicates, each of which performed in technical triplicates).
(F) The type II esterase metal-binding site is required for full 4-O-AE activity.
Note that disruption of metal binding by either Glu48Gln or Asp56Asn
substitution reduces sialate-4-O-acetylesterase activity by 75% (comparable to
the amount of type II activity conferred by the Ala74Ser substitution alone).
Enzymatic activity measured as in Fig. 1B and presented as in Fig. 4E. (G) 4-O-
and 9-O-Ac-Sias are abundantly expressed in the mouse colon.
Paraffin-embedded mouse colon tissue sections were stained for 4-O-Ac-Sia with
MHV-S HE⁰-Fc, and for 9-O-Ac-Sia with PToV-P4 HE⁰-Fc.
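
The legend above expresses esterase activity as a percentage of specific activity, calculated at 50% receptor depletion in a twofold dilution series. The exact calculation is not spelled out in the text, so the following is only one plausible sketch under stated assumptions: the enzyme concentration giving 50% depletion is interpolated on a log2 concentration scale between the two bracketing dilutions, and relative specific activity is taken as the inverse ratio of those concentrations. The function names and all depletion values are hypothetical and are not data from this study.

```python
import math


def conc_at_half_depletion(concs, depletion):
    """Interpolate the enzyme concentration that gives 50% receptor depletion.

    concs:     enzyme concentrations in descending twofold steps
    depletion: percent depletion measured at each concentration
    Interpolation is linear in log2(concentration), an assumption made
    for this sketch rather than a method stated in the source.
    """
    points = list(zip(concs, depletion))
    for (c_hi, d_hi), (c_lo, d_lo) in zip(points, points[1:]):
        if d_hi >= 50.0 >= d_lo and d_hi != d_lo:
            frac = (50.0 - d_lo) / (d_hi - d_lo)
            log_c = math.log2(c_lo) + frac * (math.log2(c_hi) - math.log2(c_lo))
            return 2.0 ** log_c
    raise ValueError("50% depletion is not bracketed by the dilution series")


def relative_activity(ref_c50, test_c50):
    """Specific activity of the test enzyme as a percentage of the reference."""
    return 100.0 * ref_c50 / test_c50


# Hypothetical twofold series starting at 10 (arbitrary concentration units).
series = [10.0 / 2 ** i for i in range(6)]
reference = [100, 95, 80, 50, 20, 5]   # invented depletion values (%)
mutant = [95, 80, 50, 20, 5, 0]        # invented depletion values (%)

ref_c50 = conc_at_half_depletion(series, reference)
mut_c50 = conc_at_half_depletion(series, mutant)
print(relative_activity(ref_c50, mut_c50))
```

In this toy series the mutant needs twice as much enzyme as the reference to reach 50% depletion, so its relative specific activity comes out at about half that of the reference.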
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A Single Ala-to-Ser Amino Acid Substitution Was Key to the Emergence 
of Type II HEs. The shift in ligand/substrate preference in the 
MuCoV HE proteins required coevolution of two distinct do¬ 
mains, and, at first glance, the odds of this happening would 
seem remote. Clearly, the order in which the different events took 
place cannot be established, i.e., it is unknown whether a shift in 
lectin ligand specificity occurred first with a shift in esterase
substrate specificity following suit or vice versa. In either scenario,
however, the single substitution Ala-to-Ser in the β2α3 segment
would have been key. It is quite possible that the initial change
occurred in the lectin domain through mutations that allowed 
chance low-affinity virion binding to 4-O-Ac-Sias. However, without 
an esterase domain to support catalysis-driven virion release from 
the new ligand, mutant viruses would have been fully dependent on 
the kinetics of the lectin-ligand binding equilibrium for reversibility 
of attachment. Thus, even with the lectin domain taking the lead,
the novel receptor specificity might have presented a viable
evolutionary alternative to the parental type I binding only because
an enzyme with sialate-4-O-acetylesterase activity
could arise through a single amino acid substitution. In the reverse
scenario, a single mutation in the esterase domain, resulting in a 
promiscuous enzyme that retained parental substrate specificity, but 
with the capacity to also cleave 4-O-Ac-Sias, might have set the 
stage for the changes in the lectin domain to occur, leading to a 
shift in ligand specificity. Be that as it may, at least the order of
changes in the esterase domain itself can be understood.
Conceivably, the single β2α3 Ala-to-Ser substitution would have allowed
further evolution toward optimal activity and substrate specificity of
the enzyme. In this view, an inconspicuous point mutation opened
the window of opportunity for the far more extensive,
interdependent adaptations in the α1β2 and β16α6 segments to occur.

The Type II HE Receptor Switch Explained from the Stereochemistry of 
Protein-Carbohydrate Interactions. Specific recognition of sugars by 
proteins is subject to intricacies connected with carbohydrate 
structure and stereochemistry (41, 42). “Simple” monosaccharides
like galactose and mannose offer few functional groups. Their
hydroxyl moieties, constituting the principal binding partners in
carbohydrate-protein interaction sites, are engaged in complex
interaction networks involving direct or water-mediated hydrogen
bonds and, often, metal ion coordination (41, 43). As such
interactions commonly involve pairs of adjacent hydroxyls, the spatial
arrangement of the two OH groups is imposed on the architecture
of the CBS. With any such constellation not being unique to one
particular monosaccharide, selection of the proper ligand and
exclusion of closely related sugars requires additional specific
interactions (43). On the flip side, this binding strategy confers a
remarkable versatility such that with modest changes in protein
structure through preservation of the geometry of the crucial
hydrogen and coordinating bonds, the CBS can be adapted to fit
alternative ligands and ligand topologies (41–48) (Fig. S6). Sias
possess a large number of accessible functional groups (carboxylate,
5-N-acyl, the hydroxyls or substitutions thereof at ring atom C4 and
at glycerol side chain atoms C7, C8, and C9), which, as argued by
Neu et al. (3), should allow an “unparalleled number” of
sugar-protein interactions. Although this is true, our findings described
here and elsewhere (19-21) suggest that, for biomolecular
recognition of 9- and 4-O-acetylated Sias, the same basic principles apply
as were established for less complex monosaccharides. The shift in
esterase substrate from 9- to 4-O-Ac-Sias was accomplished not
through radical changes in protein architecture, but by altering
ligand binding topology in the context of a largely conserved
CBS. This was possible on account of (i) the fortuitous
stereochemical similarity between 4-O- and 9-O-Ac-Sias with the
9-O- and 4-O-Ac moieties positioned at similar angles and roughly
similar distances with respect to the central 5-N-acyl; and (ii) a
recurring mechanism of protein binding to O-Ac-Sias, involving the
recognition of pairs of identical functional groups (Ac-moieties)
based on shape complementarity, with the 5-N- and O-Ac-methyls
docking into hydrophobic pockets astride of an intercalating
aromatic amino acid side chain (19-21). The adaptations in the
type II HE esterase are in fact analogous to those that took place
in the corresponding type II lectin domain (21). In either case,
the ancestral type I CBS was modified so as to reduce the distance
between O- and N-Ac docking sites to accommodate the shorter
distance between the sialate-4-O- and -5-N-acyl groups (6 Å, versus
7 Å for that between the sialate-9-O- and -5-N-acyls). In this sense,
the reciprocal changes that occurred in the type II lectin and
esterase domains to adjust ligand and substrate specificity present a
singular case of convergent intramolecular coevolution.

HE Receptor Switching: Virus Evolution Driven by Sialoglycan 
Diversity Among Hosts and Tissues? The mere occurrence of the 
type II MuCoV biotype implies that the shift to using 4-O-Ac-Sias 
for virion attachment resulted in a gain in viral fitness. Although we 
now understand in structural terms how the transition in ligand/ 
substrate specificity occurred, it remains an open question what 
biological conditions triggered the emergence of the type II HEs 
and favored their selection. Both 9- and 4-O-Ac-Sias are abundant
in the murine gastrointestinal tract, particularly in the colon (Fig.
4G) (8), and the cocirculation of type I and II MuCoVs in nature
indicates that in principle either type of Sia can serve as attachment
factor. There may be differences, however, in expression levels and/or
in tissue and cell distribution between 9- and 4-O-Ac-Sias,
subtle or less subtle, that so far have gone unnoticed, and that
were yet of decisive importance. Saliently, of 27 strains in the
species Murine coronavirus identified so far, only three (DVIM, MI,
and -2) possess a type I HE. It is tempting to speculate that type I
MuCoVs represent an ancestral biotype that is gradually being 
replaced by type II. However, our knowledge of MuCoV diversity 
in nature is limited and restricted to a relatively small number of 
laboratory isolates mostly from mice (Mus musculus domesticus) 
and rats (Rattus norvegicus) kept in animal facilities. We have little 
to no understanding of the complexity and interspecies diversity of 
the sialomes in naturally occurring murids or in other mammals for 
that matter. It is in the unraveling of how such factors might direct 
virus evolution that a next challenge lies. 

Materials and Methods 

Expression and Purification of CoV HEs. Human codon-optimized sequences for
the HE ectodomains of RCoV-NJ (residues 22-400) and MHV-DVIM (residues
24-395) were cloned in expression plasmid pCD5-T-Fc (19). The resulting constructs
code for chimeric HE proteins that (i) are provided with a CD5 signal peptide,
and, C-terminally, with a thrombin cleavage site and the human IgG1 Fc
domain, and that (ii) are either enzymatically active (HE-Fc) or rendered inactive
through catalytic Ser-to-Ala substitution (HE°-Fc). Site-specific mutations were
introduced by Q5 PCR mutagenesis (New England Biolabs). For receptor
destruction esterase assays, HEs were produced by transient expression in HEK293T
cells and purified from cell culture supernatants by protein A-affinity
chromatography and low-pH elution as described (19). For crystallization, HEs were
transiently expressed in HEK293 GnTI(-) cells (49), and the ectodomains were
purified by protein A-affinity chromatography and on-the-beads thrombin
cleavage as described (19). Purified HEs were concentrated to 5-10 mg/mL, and,
in the case of RCoV-NJ, deglycosylated by the addition of 1 MU/mL Endo Hf (New
England Biolabs), and incubated for 1 h at room temperature before the setup
of crystallization experiments.

Crystallization and X-Ray Data Collection. MHV-DVIM HE crystals with P2₁2₁2₁
space group were grown at 20 °C using sitting-drop vapor diffusion against a
well solution containing 0.1 M Tris HCl, pH 8.0, 0.05 M NaF, 16% (wt/vol)
PEG3350, and 10% (vol/vol) glycerol. RCoV-NJ HE° crystals with C222₁ space
group grew against two different well solutions: 0.1 M Bis-Tris propane, pH 7.5,
0.2 M NaF, and 20% (wt/vol) PEG3350; and 0.1 M Hepes, pH 7.5, 0.2 M NaCl, and
20% (wt/vol) PEG3000. The structure of RCoV-NJ HE without ligand was
obtained from the first condition, and the structure of RCoV-NJ HE in complex
with receptor analog was obtained from crystals grown in the latter condition.
These latter crystals were, before flash-freezing, soaked for 10 min at 4 °C in
cryoprotectant containing 100 mM 4,5-di-N-acetylneuraminic acid
α-methylglycoside (for the synthesis of this compound, see SI Materials and Methods and
Fig. S7). Crystals were cryoprotected in well solution containing 20% (RCoV-NJ)
or 12.5% (MHV-DVIM) (vol/vol) glycerol before flash-freezing in liquid nitrogen.
Diffraction data of MHV-DVIM were integrated with Eval15 (50) and diffraction
data of RCoV-NJ were integrated with Mosflm (51). Integrated diffraction data
were further processed using the CCP4 package (52). The structures of RCoV-NJ
HE and MHV-DVIM HE were solved by molecular replacement using the HE
structure from MHV-S [PDB ID code 4C7L (21)] and BCoV-Mebus [PDB ID code
3CL5 (19)] as search models, respectively. Models were refined using REFMAC
(53) alternated with manual model improvement using COOT (54). Refinement
procedures included TLS refinement using either one (RCoV-NJ HE) or three TLS
groups per molecule (MHV-DVIM HE). For ligand-free RCoV-NJ HE°, Rwork and Rfree
had final values of 23.2% and 27.8%, respectively. For RCoV-NJ HE° complexed with
4,5-di-N-Ac-Sia, Rwork and Rfree had final values of 18.5% and 20.3%. For
MHV-DVIM HE, these values were 19.9% and 22.6%, respectively. Statistics
of data processing and refinement are listed in Table 1.

X-Ray Fluorescence Measurements. X-ray absorption spectra were recorded 
from RCoV HE crystals on European Synchrotron Radiation Facility (ESRF) 
beamline ID29 in fluorescence mode using a Rontec Xflash X-ray fluorescence 
detector. The X-ray energy was scanned around the Zn K-edge (λ = 1.28 Å;
energy = 9,668 eV).
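
As a quick consistency check on the quoted wavelength and energy, the relation λ = hc/E can be evaluated directly. The following is a minimal sketch for illustration only; the function names are hypothetical and it is not part of the data collection workflow described here.

```python
# Convert between photon energy (eV) and wavelength (Å) using lambda = h*c / E.
# Constants are the exact SI (2019) values; function names are illustrative only.

PLANCK_EV_S = 4.135667696e-15   # Planck constant in eV*s
LIGHT_M_S = 299_792_458.0       # speed of light in m/s
HC_EV_ANGSTROM = PLANCK_EV_S * LIGHT_M_S * 1e10  # about 12398.42 eV*Å


def wavelength_angstrom(energy_ev: float) -> float:
    """Return the photon wavelength in Å for a photon energy given in eV."""
    return HC_EV_ANGSTROM / energy_ev


def energy_ev(wavelength_a: float) -> float:
    """Return the photon energy in eV for a wavelength given in Å."""
    return HC_EV_ANGSTROM / wavelength_a


if __name__ == "__main__":
    # Energy quoted in the text for the scan around the Zn K-edge
    print(f"{wavelength_angstrom(9668.0):.2f} Å")
```

With the energy of 9,668 eV given above, the computed wavelength rounds to the 1.28 Å stated in the text.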

Molecular Docking. Molecular docking of 9-O- and 4-O-Ac-Sia in the crystal
structures of MHV-DVIM HE and RCoV-NJ HE, respectively, was performed with 
AutoDock4 (55). The Sia molecules used for docking were extracted from BCoV 
HE (PDB ID code 3CL5; for 9-O-Ac-Sia) and from MHV-S HE (PDB ID code 4C7W; 
for 4-O-Ac-Sia). Ligand files were processed with AutoDockTools. During 
docking, the protein was considered to be rigid. This assumption is justified by 
the observation that binding of substrate analogs in the crystal structures of 
HEF and RCoV-NJ HE does not induce conformational changes, except that in 
HEF, a rotation of the active-site Ser side chain was observed (39). Active-site 
Ser44 in MHV-DVIM HE already adopts the "active" rotamer observed in HEF;
for RCoV-NJ HE, which was crystallized as an inactive Ser-to-Ala mutant, a Ser
side chain with active rotamer was introduced using COOT. We used an
inverted Gaussian function (50-Å half-width; 15-kJ energy at infinity) to restrain
the O-acetyl carbonyl oxygen in the oxyanion hole at a position occupied by a 
water molecule in the respective crystal structures. The carbonyl oxygen must 
be located close to this position to enable charge stabilization of the negatively 
charged tetrahedral reaction intermediate, which is a critical step in the well- 
established reaction mechanism (39, 40). To reproduce the observed binding 
modes of substrate (analogs) in the active site of HEF and the lectin domains of 
RCoV-NJ HE and BCoV HE, it proved necessary to constrain the torsion angles 
internal to the glycerol moiety to values observed in the RCoV-NJ HE and BCoV- 
Mebus HE complexes. These values are very similar in both HE complexes as well 
as in numerous other Sia-protein complexes in the PDB. The initial ligand 
conformation was randomly assigned and 10 docking runs were performed. 
The method was validated by docking 9-O-Ac-Sia in the MHV-DVIM HE
structure, which gave a mode of binding essentially identical to that of the substrate
molecule from HEF superimposed on the MHV-DVIM HE structure (Fig. 2 C and
D), and, by docking 4-O-Ac-Sia in the RCoV-NJ HE structure, which gave an
identical mode of binding for the 10 lowest energy solutions, which were
essentially identical to that observed for the crystal complex with 4-N-Ac-Sia (Fig.
3 A and Fig. S8).

Receptor Destruction Esterase Assay. The enzymatic activity of MHV-DVIM and 
RCoV-NJ HE toward O-acetylated Sias was measured as described (21). Briefly, 
MaxiSorp 96-well plates (Nunc), coated for 16 h at 4 °C with 100 μL of HSGs
(undiluted; TCS Biosciences) or BSM (1 μg/mL; Sigma), were treated with twofold
serial dilutions of enzymatically active HE (starting at 100 ng/μL in PBS, unless
stated otherwise in the figure legend) for 1 h at 37 °C. Depletion of O-Ac-Sia
was determined by solid-phase lectin-binding assay (8, 21) with lectin
concentrations fixed at half-maximal binding (MHV-S HE°-Fc, 5 μg/mL, for 4-O-Ac-Sia;
PToV-P4 HE°-Fc, 1 μg/mL, for 9-O-Ac-Sia). Incubation was for 1 h at 37 °C;
unbound lectin was removed by washing three times, after which bound lectin
was detected using an HRP-conjugated goat anti-human IgG antiserum
(Southern Biotech) and TMB Super Slow One Component HRP Microwell
Substrate (BioFX) according to the instructions. The staining reaction was
terminated by addition of 12.5% (vol/vol) H2SO4 and the optical density was
measured at 450 nm. Graphs were constructed using GraphPad (GraphPad
Software). All experiments were repeated as biological replicates at least two
times and each time in technical triplicate, yielding identical results.
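
The legend to Fig. 4E reports specific esterase activities "calculated at 50% receptor depletion" from twofold dilution series, but the exact calculation is not spelled out here. The sketch below shows one plausible way to do this, under stated assumptions: the enzyme concentration giving a half-maximal lectin-binding signal is interpolated on a log2 concentration scale, and specific activity is taken to be inversely proportional to that concentration. All function names and example values are hypothetical.

```python
import math


def conc_at_half_depletion(concs, signals, max_signal):
    """Interpolate the enzyme concentration at which the lectin-binding signal
    falls to 50% of the untreated (maximal) signal.

    concs      -- enzyme concentrations of the twofold series, highest first
    signals    -- lectin-binding signals (e.g., OD450) for the same wells
    max_signal -- signal of untreated wells (no receptor depletion)

    Assumption: linear interpolation on a log2 concentration axis between the
    two neighbouring dilutions that bracket the half-maximal signal.
    """
    half = 0.5 * max_signal
    pairs = list(zip(concs, signals))
    for (c_hi, s_hi), (c_lo, s_lo) in zip(pairs, pairs[1:]):
        # More enzyme means more depletion, hence a lower signal at c_hi.
        if s_hi <= half <= s_lo:
            if s_lo == s_hi:
                return c_hi
            frac = (half - s_hi) / (s_lo - s_hi)
            log_c = math.log2(c_hi) + frac * (math.log2(c_lo) - math.log2(c_hi))
            return 2.0 ** log_c
    raise ValueError("50% depletion not reached within the dilution series")


def relative_activity_percent(c50_sample, c50_reference):
    """Activity of a sample as a percentage of a reference enzyme, assuming
    specific activity is inversely proportional to the 50% depletion point."""
    return 100.0 * c50_reference / c50_sample


if __name__ == "__main__":
    # Hypothetical data, not taken from the paper.
    concs = [100.0, 50.0, 25.0, 12.5]      # ng/μL, twofold dilutions
    signals = [0.1, 0.2, 0.8, 1.0]         # OD450 after enzyme treatment
    c50 = conc_at_half_depletion(concs, signals, max_signal=1.0)
    print(f"50% depletion at about {c50:.1f} ng/μL")
```

Under these assumptions, a mutant that needs four times more enzyme than the reference to reach 50% depletion would be scored at 25% of reference activity.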

pNPA Assay. 4-Nitrophenyl acetate (pNPA) yields a chromogenic
p-nitrophenolate anion (pNP) upon hydrolysis, which can be monitored at 405 nm.
HE-Fc esterase activity toward pNPA was measured essentially as described
(56). Briefly, 50 ng of HE was incubated with 1 mM pNPA in PBS and the
amount of pNP was determined spectrophotometrically at 405 nm every 20 s
for 15 min. Specific activity was defined as product yield/mass of enzyme
(micromolar pNP per microgram of HE) and subsequently expressed as a
percentage of wild-type HE activity.
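
The definition of specific activity given above (micromolar pNP per microgram of HE, expressed relative to wild type) can be illustrated with a short calculation. This is a sketch under stated assumptions: absorbance is converted to pNP concentration with the Beer-Lambert law, and the extinction coefficient and path length used in the example are placeholders that would have to be calibrated for the actual buffer and plate reader; they are not values from this study.

```python
def pnp_micromolar(a405, blank_a405, epsilon_per_m_cm, path_cm):
    """Convert a blank-corrected A405 reading to pNP concentration in μM
    using Beer-Lambert: c = A / (epsilon * l)."""
    molar = (a405 - blank_a405) / (epsilon_per_m_cm * path_cm)
    return molar * 1e6


def specific_activity(pnp_um, enzyme_ug):
    """Product yield per mass of enzyme (μM pNP per μg of HE)."""
    return pnp_um / enzyme_ug


def percent_of_wild_type(mutant_activity, wild_type_activity):
    """Express a specific activity as a percentage of wild-type activity."""
    return 100.0 * mutant_activity / wild_type_activity


if __name__ == "__main__":
    # Placeholder calibration values (assumptions, not from the paper).
    EPSILON = 10_000.0   # M^-1 cm^-1, hypothetical
    PATH_CM = 1.0        # cm, hypothetical
    ENZYME_UG = 0.05     # 50 ng of HE, as stated in the text

    wt = specific_activity(pnp_micromolar(0.50, 0.0, EPSILON, PATH_CM), ENZYME_UG)
    mut = specific_activity(pnp_micromolar(0.25, 0.0, EPSILON, PATH_CM), ENZYME_UG)
    print(f"mutant activity: {percent_of_wild_type(mut, wt):.0f}% of wild type")
```

Because the final readout is a ratio to wild type, the choice of extinction coefficient and path length cancels out as long as both enzymes are measured under identical conditions.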

O-Ac-Sia Expression in Mouse Colon. Tissue stainings were performed as
described (8). In short, paraffin-embedded colon sections (Gentaur; AMS541)
were dewaxed in xylene and rehydrated. 4- and 9-O-Ac-Sias were detected by
incubating with MHV-S and PToV-P4 HE°-mFc virolectins, respectively, and
subsequently incubated with biotinylated goat-α-mouse IgG antibodies
(Sigma-Aldrich; 1:250), with avidin-biotin HRPO complex (ABC-PO staining kit;
Thermo Scientific), and with 3,3′-diaminobenzidine (DAB) (Sigma-Aldrich).
Counterstaining was done with Mayer's hematoxylin; tissue sections were
embedded in Eukitt mounting medium (Fluka) and examined by standard
light microscopy.